Compilation and Communication Strategies for Out-of-Core Programs on Distributed Memory Machines
Authors
Rajesh Bordawekar, Alok Choudhary
Electrical and Computer Engineering Department, 121 Link Hall, Syracuse University, Syracuse, NY 13244
rajesh, [email protected]
URL: http://www.cat.syr.edu/~{rajesh,choudhary}
J. Ramanujam
ECE Dept., Louisiana State University, Baton Rouge, LA 70803
[email protected]
URL: http://www.ee.lsu.edu/jxr/jxr.html
Abstract
It is widely acknowledged that improving parallel I/O performance is critical for the widespread adoption of high performance computing. In this paper, we show that communication in out-of-core distributed memory problems may require both inter-processor communication and file I/O. Thus, in order to improve I/O performance, it is necessary to minimize the I/O costs associated with a communication step. We present three methods for performing communication in out-of-core distributed memory problems. The first method, called the generalized collective communication method, follows a loosely synchronous model; computation and communication phases are clearly separated, and communication requires permutation of data in files. The second method, called receiver-driven in-core communication, considers only the communication required of each in-core data slab individually. The third method, called owner-driven in-core communication, goes one step further and tries to identify the potential future use of data (by the recipients) while it is still in the sender's memory. We describe these methods in detail and present a simple heuristic for choosing among the three methods. We then provide performance results for two out-of-core applications, a two-dimensional FFT code and a two-dimensional elliptic Jacobi solver. Finally, we discuss how the out-of-core and in-core communication methods can be used in virtual memory environments on distributed memory machines.
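The receiver-driven in-core communication method described above can be illustrated with a small, hypothetical sketch (it is not the authors' implementation): each "processor" owns a block of a large array kept "on disk", computation proceeds one in-core slab at a time, and before computing on a slab the receiver fetches only the off-processor boundary values that this particular slab needs, rather than permuting the whole file up front. Here, plain Python lists stand in for the per-processor array files, and the slab size is an assumed parameter.

```python
# Hypothetical sketch of receiver-driven in-core communication for an
# out-of-core 1-D Jacobi sweep, simulated in a single process. Each
# "processor" p owns files[p], a block of the global array held "on disk"
# (a list stands in for the local array file). Only SLAB elements are
# assumed to fit in memory at once.

SLAB = 4  # elements that fit in "memory" at a time (assumed parameter)

def receiver_driven_stencil(files):
    """One nearest-neighbour averaging sweep over per-processor on-disk
    arrays; returns the new per-processor arrays."""
    nprocs = len(files)
    out = []
    for p in range(nprocs):
        n = len(files[p])
        new = []
        for lo in range(0, n, SLAB):          # process one in-core slab at a time
            hi = min(lo + SLAB, n)
            # Receiver-driven step: fetch only the boundary values this
            # slab needs, from the local file or a neighbour's file.
            left = (files[p][lo - 1] if lo > 0
                    else files[p - 1][-1] if p > 0
                    else files[p][lo])        # mirror at the global boundary
            right = (files[p][hi] if hi < n
                     else files[p + 1][0] if p < nprocs - 1
                     else files[p][hi - 1])   # mirror at the global boundary
            slab = [left] + files[p][lo:hi] + [right]
            # Jacobi update on the in-core slab.
            new.extend((slab[i - 1] + slab[i + 1]) / 2.0
                       for i in range(1, len(slab) - 1))
        out.append(new)
    return out
```

The contrast with the generalized collective method is that no global permutation of the files is ever performed; each slab's communication is resolved on demand as the slab becomes in-core.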
Similar resources
Compilation Techniques for Out-of-Core Parallel Computations
The difficulty of handling out-of-core data limits the performance of supercomputers as well as the potential of parallel machines. Since writing an efficient out-of-core version of a program is a difficult task, and virtual memory systems do not perform well on scientific computations, we believe that there is a clear need for a compiler-directed explicit I/O approach for out-of-core computat...
Efficient Compilation of Out-of-Core Data Parallel Programs
Large scale scientific applications, such as the Grand Challenge applications, deal with very large quantities of data. The amount of main memory in distributed memory machines is usually not large enough to solve problems of realistic size. This limitation results in the need for system and application software support to provide efficient parallel I/O for out-of-core programs. This paper describ...
Data Access Reorganizations in Compiling Out-of-Core Data Parallel Programs on Distributed Memory Machines
This paper describes optimization techniques for translating out-of-core programs written in a data parallel language like HPF to message passing node programs with explicit parallel I/O. We first discuss how an out-of-core program can be translated by extending the method used for translating in-core programs. We demonstrate that straightforward extension of in-core compilation techniques does n...
Parallelization of Irregular Codes Including Out-of-Core Data and Index Arrays
This paper describes techniques for implementing irregular out-of-core codes on distributed memory machines. These codes involve data arrays and other data structures that are too large to fit in main memory, so data needs to be stored on disks and fetched during the execution of the program. The efficient use of disk storage is a critical factor that determines the performance of these application...
Synonyms: Parallel Communication Models; Message-Passing Performance Models
Bandwidth-latency models are a group of performance models for parallel programs that model the communication between processes in terms of network bandwidth and latency, allowing fairly precise performance estimates. While originally developed for distributed-memory architectures, these models also apply to machines with non-uniform memory access (NUMA), like the modern multi-...
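The bandwidth-latency family mentioned above includes the classic alpha-beta (Hockney) model, which estimates the time to send an m-byte message as T(m) = alpha + beta * m, where alpha is the per-message latency and beta the per-byte transfer time (the reciprocal of bandwidth). A minimal sketch, with illustrative parameter values only:

```python
# Alpha-beta (Hockney) bandwidth-latency model: predicted time for a
# point-to-point transfer of m bytes is T(m) = alpha + beta * m.
# alpha: per-message startup latency (seconds)
# beta:  per-byte transfer time, i.e. 1 / bandwidth (seconds per byte)

def message_time(m_bytes, alpha, beta):
    """Predicted transfer time for an m-byte message."""
    return alpha + beta * m_bytes
```

For small messages the alpha (latency) term dominates; for large messages the beta (bandwidth) term does, which is why aggregating many small transfers into fewer large ones is a standard optimization.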
Journal: J. Parallel Distrib. Comput.
Volume 38, Issue -
Pages: -
Publication date: 1996